Alignment-Based Discriminative String Similarity
Abstract
A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings. This approach performs strongly: on nine separate cognate identification experiments using six language pairs, we more than double the precision of traditional orthographic measures such as the Longest Common Subsequence Ratio (LCSR) and Dice's coefficient. We also show strong improvements over other recent discriminative and heuristic similarity functions.
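The abstract compresses the method into a single sentence, so a small sketch may help make "substring pairs consistent with a character-based alignment" concrete. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a unit-cost Levenshtein alignment and borrows the phrase-pair consistency rule familiar from statistical machine translation (each pair must contain at least one alignment link, and no link may cross a span boundary). The helper names `align`, `consistent_pairs`, and `features`, the `max_len=3` cap, and the `x->y` feature encoding are all hypothetical. For comparison it also implements the two baselines named in the abstract, LCSR and Dice's coefficient (using one common variant of Dice over sets of character bigrams).

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (index in a, index in b)


def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b):
            cur.append(prev[j] + 1 if ca == cb else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Baseline: LCS length divided by the length of the longer string."""
    return lcs_length(a, b) / max(len(a), len(b), 1)


def dice(a: str, b: str) -> float:
    """Baseline: Dice's coefficient over sets of character bigrams."""
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ba or not bb:
        return 0.0  # a string shorter than 2 characters has no bigrams
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def align(a: str, b: str) -> List[Link]:
    """Assumed aligner: unit-cost Levenshtein; returns links for matched or substituted chars."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          d[i - 1][j] + 1, d[i][j - 1] + 1)
    links: List[Link] = []
    i, j = n, m
    while i > 0 and j > 0:  # backtrace; leftover characters stay unaligned
        if d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            links.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1  # a[i-1] deleted
        else:
            j -= 1  # b[j-1] inserted
    return links[::-1]


def consistent_pairs(a: str, b: str, max_len: int = 3) -> Set[Tuple[str, str]]:
    """Substring pairs (each side up to max_len chars) that no alignment link crosses."""
    links = align(a, b)
    a_to_b: Dict[int, int] = dict(links)
    b_to_a: Dict[int, int] = {j: i for i, j in links}
    pairs: Set[Tuple[str, str]] = set()
    for i1 in range(len(a)):
        for i2 in range(i1 + 1, min(i1 + max_len, len(a)) + 1):
            for j1 in range(len(b)):
                for j2 in range(j1 + 1, min(j1 + max_len, len(b)) + 1):
                    has_link = any(i1 <= i < i2 and j1 <= j < j2 for i, j in links)
                    a_ok = all(j1 <= a_to_b[i] < j2 for i in range(i1, i2) if i in a_to_b)
                    b_ok = all(i1 <= b_to_a[j] < i2 for j in range(j1, j2) if j in b_to_a)
                    if has_link and a_ok and b_ok:
                        pairs.add((a[i1:i2], b[j1:j2]))
    return pairs


def features(a: str, b: str) -> Set[str]:
    """Hypothetical binary feature names that a discriminative classifier could weight."""
    return {f"{x}->{y}" for x, y in consistent_pairs(a, b)}


print(round(lcsr("colour", "color"), 3), round(dice("colour", "color"), 3))  # 0.833 0.667
```

For `colour` vs `color`, the extracted pairs include `our->or` and `ur->r`, which capture the dropped `u`, while `co->col` is rejected because the link from the second `l`... more precisely, the link from `color`'s `l` points outside the `co` span. In a discriminative setup, such features would be weighted by a classifier (for example, a linear SVM) trained on labeled cognate and non-cognate pairs, rather than being combined by a fixed formula as LCSR and Dice are.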
Similar resources
String Similarity Metrics for Ontology Alignment
Ontology alignment is an important part of enabling the semantic web to reach its full potential. The vast majority of ontology alignment systems use one or more string similarity metrics, but often the choice of which metrics to use is not given much attention. In this work we evaluate a wide range of such metrics, along with string preprocessing strategies such as removing stop words and cons...
FCICU at SemEval-2017 Task 1: Sense-Based Language Independent Semantic Textual Similarity Approach
This paper describes the FCICU team's systems that participated in the SemEval-2017 Semantic Textual Similarity task (Task 1) for monolingual and cross-lingual sentence pairs. A sense-based, language-independent textual similarity approach is presented, which couples a proposed alignment similarity method with a new use of a semantic network (BabelNet). Additionally, a previously proposed integr...
Similarity matching of continuous melody contours for humming querying of melody databases
Music query-by-humming is a challenging problem, since the humming query inevitably contains much variation and inaccuracy. Many previous methods, which adopt note segmentation and string matching with dynamic programming, suffer drastically from errors in note segmentation, which affect retrieval accuracy and efficiency. In this paper, we present a novel melody similarity matchin...
A Comparative Evaluation of String Similarity Metrics for Ontology Alignment
Ontology alignment is regarded as the most promising way to achieve semantic interoperability among heterogeneous data. Most state-of-the-art ontology alignment systems use one or more string similarity metrics, yet the performance of these metrics has received little attention. In this paper we first analyze naming variations in competing ontologies, then we evaluate a wide range o...
Grammar string: a novel ncRNA secondary structure representation
Multiple ncRNA alignment has important applications in homologous ncRNA consensus structure derivation, novel ncRNA identification, and known ncRNA classification. As many ncRNAs’ functions are determined by both their sequences and secondary structures, accurate ncRNA alignment algorithms must maximize both sequence and structural similarity simultaneously, incurring high computational cost. F...